Creating a remote browser (init'ing the first contentparent) shouldn't have to read (and, for new profiles, write) plugin data on the main thread
Categories
(Core Graveyard :: Plug-ins, defect, P3)
Tracking
(firefox68 wontfix, firefox72 fixed)
People
(Reporter: Gijs, Assigned: Gijs)
References
(Blocks 2 open bugs)
Details
(Keywords: main-thread-io, perf, Whiteboard: [fxperf:p2] [fxperfsize:L])
Attachments
(5 files)
Salient bit of stack:
nsSubDocumentFrame::ShowViewer() [xul.dll]
nsFrameLoader::Show(int,int,int,int,nsSubDocumentFrame *) [xul.dll]
nsFrameLoader::ShowRemoteFrame []
nsFrameLoader::ShowRemoteFrame(mozilla::gfx::IntSizeTyped<mozilla::ScreenPixel> const &,nsSubDocumentFrame *) [xul.dll]
nsFrameLoader::TryRemoteBrowser:Create []
nsFrameLoader::TryRemoteBrowser() [xul.dll]
ContentParent::CreateBrowser []
mozilla::dom::ContentParent::CreateBrowser(mozilla::dom::TabContext const &,mozilla::dom::Element *,mozilla::dom::ContentParent *,mozilla::dom::TabParent *,unsigned __int64) [xul.dll]
mozilla::dom::ContentParent::GetNewOrUsedBrowserProcess(mozilla::dom::Element *,nsTSubstring<char16_t> const &,mozilla::hal::ProcessPriority,mozilla::dom::ContentParent *,bool) [xul.dll]
ContentParent::LaunchSubprocess []
void mozilla::dom::ContentParent::LaunchSubprocessInternal(mozilla::hal::ProcessPriority, class mozilla::Variant<bool *,RefPtr<mozilla::MozPromise<RefPtr<mozilla::dom::ContentParent>,mozilla::ipc::GeckoChildProcessHost::LaunchError,0> > *> *) [xul.dll]
ContentParent::LaunchSubprocess::resolve []
static class RefPtr<mozilla::MozPromise<RefPtr<mozilla::dom::ContentParent>,mozilla::ipc::GeckoChildProcessHost::LaunchError,0> > mozilla::dom::ContentParent::LaunchSubprocessInternal::<unnamed-tag>::operator()(void *) [xul.dll]
void mozilla::dom::ContentParent::InitInternal(mozilla::hal::ProcessPriority) [xul.dll]
nsPluginHost::GetInst() [xul.dll]
void nsPluginHost::nsPluginHost() [xul.dll]
nsresult nsPluginHost::LoadPlugins() [xul.dll]
nsresult nsPluginHost::FindPlugins(bool, bool *) [xul.dll]
nsresult nsPluginHost::ReadPluginInfo() [xul.dll]
By default, we only support flash, and by default that's click to play, I think. If we can, we should avoid reading this information (as well as the plugin dll files etc.) early on the main thread.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 1•6 years ago
|
||
All of these accesses in the IO test Florian is working on look related to this code:
{path: "ProfD:pluginreg.dat",
platform: "win",
stat: 1}, //
{path: "ProfD:pluginreg.dat.tmp",
stat: 1,
write: 64,
close: 1},
{path: "ProfD:plugins",
platform: "win",
stat: 1},
{path: "APlugns:",
platform: "win",
stat: 1},
{path: "UserPlugins.parent:",
platform: "win",
stat: 1},
{path: "UserPlugins:",
platform: "win",
stat: 1},
{path: "ProfD:plugins/nptest.dll",
platform: "win",
stat: 1},
{path: "ProfD:plugins/npsecondtest.dll",
platform: "win",
stat: 1},
{path: "ProfD:plugins/npthirdtest.dll",
platform: "win",
stat: 1},
{path: "ProfD:plugins/npswftest.dll",
platform: "win",
stat: 1},
Updated•6 years ago
|
Comment 2•6 years ago
|
||
The priority flag is not set for this bug.
:jimm, could you have a look please?
For more information, please visit auto_nag documentation.
![]() |
||
Updated•6 years ago
|
Comment 3•6 years ago
|
||
So I looked at this a little bit today.
It looks like we do this so that the content process has a sense of what plugins are available (via an array of nsPluginTags) on the system to load before it loads any pages. Before, I think each content process used to send a synchronous IPC message to fetch this information.
It feels like, especially these days, a lot of this could maybe be simplified. All of this nsPluginTag stuff seems superfluous when mostly what we care about is:
- Is Flash available?
- Is Flash set to Never Activate?
Peering under the hood, it looks like the Never Activate / Click-to-Activate state is stored (or at least, mirrored) in preferences, so this is something that doesn't really need to be re-sent to the content process:
Since prefs are already mirrored between the two processes then, it sounds like what the parent needs to tell the content process is whether or not the plugin is available.
Re-reading https://bugzilla.mozilla.org/show_bug.cgi?id=1371888#c6, it looks like there was originally a plan to do the following:
- After start-up, scan for Flash off of the main thread
- Once the scan is done, update all content processes letting them know if Flash exists
- If a content process needs plugin information before the parent has finished the scan, grossly block the content process until the scan is complete, in the hopes that this is super uncommon.
Assignee | ||
Comment 4•6 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up from PTO) from comment #3)
So I looked at this a little bit today.
Jinx! :-)
It looks like we do this so that the content process has a sense of what plugins are available (via an array of nsPluginTags) on the system to load before it loads any pages. Before, I think each content process used to send a synchronous IPC message to fetch this information.
It feels like, especially these days, a lot of this could maybe be simplified. All of this nsPluginTag stuff seems superfluous when mostly what we care about is:
- Is Flash available?
- Is Flash set to Never Activate?
Unfortunately, for unit tests, we still support the non-flash test plugins that are in the tree. When I worked on bug 1519434, the request was to keep all of that the same.
Chris, could we stop supporting non-flash plugins even for testing? We have a test plugin that professes to "be" flash that we can continue using, I think (but we would remove support for non-click-to-play plugins entirely).
The nsPluginTag infrastructure is still used to correlate plugin click to play warnings between parent and child processes. Once we actually stop supporting non-flash plugins, that shouldn't be necessary anymore, because there should be only 1 plugin installed at any time.
Peering under the hood, it looks like the Never Activate / Click-to-Activate state is stored (or at least, mirrored) in preferences, so this is something that doesn't really need to be re-sent to the content process:
Since prefs are already mirrored between the two processes then, it sounds like what the parent needs to tell the content process is whether or not the plugin is available.
Re-reading https://bugzilla.mozilla.org/show_bug.cgi?id=1371888#c6, it looks like there was originally a plan to do the following:
- After start-up, scan for Flash off of the main thread
- Once the scan is done, update all content processes letting them know if Flash exists
- If a content process needs plugin information before the parent has finished the scan, grossly block the content process until the scan is complete, in the hopes that this is super uncommon.
I suspect that despite flash's decline, enumerating navigator.plugins is probably still used for fingerprinting, and we list flash in navigator.plugins even when it's ask-to-activate (which is the default, if it's installed). So the information "have we got flash" is probably still needed for some stuff, and blocking the process sounds pretty gross... Unfortunately it doesn't seem like we have use counters for navigator.plugins, so unsure how to test my fingerprinting hypothesis...
I was thinking we could do the following:
- cache "have we got flash" information in a pref (and/or fold it into one of the existing prefs).
- have the content process rely on that pref before we finish the async scan thing
This would potentially break things on initial pageload if flash was installed since the browser was last started, but otherwise flash would work. Chris/Mike, how does that sound?
Comment 5•6 years ago
|
||
Ok, so boiling it down:
- Remove all support for non-Flash plugins, and remove support for plugins that aren't click-to-play.
- Create a new pref to represent whether or not Flash is installed, defaulted to false (this might have to be more than a boolean pref though, since
navigator.plugins["Shockwave Flash"]
wants to know other things about the plugin...) - On start-up, asynchronously scan the system for Flash, and if it's found, set the pref to be "true" (which will get mirrored down to the content processes automatically). Again, maybe this is more than just "true", and includes the necessary details for
navigator.plugins["Shockwave Flash"]
- Anytime someone tries to read
navigator.plugins["Shockwave Flash"]
, send them the contents of the pref - even if the scan is incomplete.
If that's the plan, then it sounds good to me.
I'll note that enumerating navigator.plugins
is disabled in Firefox, and has been since Firefox 29 (see 1).
Comment 6•6 years ago
|
||
Chris, could we stop supporting non-flash plugins even for testing? We have a test plugin that professes to "be" flash that we can continue using, I think (but we would remove support for non-click-to-play plugins entirely).
SGTM
- Remove all support for non-Flash plugins, and remove support for plugins that aren't click-to-play.
btw, Google's Widevine CDM and Cisco's OpenH264 decoder are listed as "Plugins" in about:addons, even though they are not NPAPI plugins (and don't show up in about:plugins). Will they be affect by any of these remote browser changes?
Widevine and OpenH264 are downloaded shortly (~1 minute?) after first run (and the remote downloads are probably disabled in tests), so we need to handle the case where Widevine or OpenH264 are not installed when the browser is launched but suddenly appear later in the session.
- Create a new pref to represent whether or not Flash is installed, defaulted to false (this might have to be more than a boolean pref though, since navigator.plugins["Shockwave Flash"] wants to know other things about the plugin...)
- On start-up, asynchronously scan the system for Flash, and if it's found, set the pref to be "true" (which will get mirrored down to the content processes automatically).
Would the "true" pref value be saved to disk so subsequent sessions are more likely to have the correct "true" answer before the asynchronous scan?
Comment 7•6 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #6)
btw, Google's Widevine CDM and Cisco's OpenH264 decoder are listed as "Plugins" in about:addons, even though they are not NPAPI plugins (and don't show up in about:plugins). Will they be affect by any of these remote browser changes?
I don't believe so, no. They're thrown into the same bucket in about:addons, but I believe the underlying infrastructure is very much distinct. nsPluginTag's, for example, don't appear to be used for GMP's.
Widevine and OpenH264 are downloaded shortly (~1 minute?) after first run (and the remote downloads are probably disabled in tests), so we need to handle the case where Widevine or OpenH264 are not installed when the browser is launched but suddenly appear later in the session.
It looks like we already have code in place to handle that separately - it looks like the MediaKeys backend code will fire a "cdm-not-installed" notification, which our browser UI picks up, and displays a message to the user saying: "Firefox is installing components needed to play the audio or video on this page. Please try again later."
Once the CDMs finish downloading, it looks like the internal state updates occur through a completely separate series of components[1], so I don't think this work will impact CDMs at all.
- Create a new pref to represent whether or not Flash is installed, defaulted to false (this might have to be more than a boolean pref though, since navigator.plugins["Shockwave Flash"] wants to know other things about the plugin...)
- On start-up, asynchronously scan the system for Flash, and if it's found, set the pref to be "true" (which will get mirrored down to the content processes automatically).
Would the "true" pref value be saved to disk so subsequent sessions are more likely to have the correct "true" answer before the asynchronous scan?
Yep!
[1]: The GMPInstallManager tells the GMPProvider that the install completes, which then adds the GMP directory to the internal list managed by the GMPService, which then updates all of the content processes itself.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 8•6 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up from PTO) from comment #5)
I'll note that enumerating
navigator.plugins
is disabled in Firefox, and has been since Firefox 29 (see 1).
FWIW, when I try to use navigator.plugins from the console, it seems to work, but maybe the console is special?
Assignee | ||
Comment 9•6 years ago
|
||
By storing the plugin information in prefs when only flash is allowed, we can
avoid reading pluginreg and doing a plugin scan on the mainthread on startup.
As part of this, we're now keeping track of the 'is flash allowed' pref on the
plugin host, and no longer write 'valid' plugin info into pluginreg if so.
Assignee | ||
Comment 10•6 years ago
|
||
In this change we:
- stop treating the nsPluginDirServiceProvider as a directory provider, as its
GetFile implementation was a no-op anyway - registering it didn't make any
difference. - stop treating it as a class entirely, because the PLID getters were already
static, so instantiating it also didn't do anything. - move IO from the plugin directory list provider and the Windows-only PLID
getters into nsPluginHost. This enables us to move it off of the main thread
later - the directory getting has to happen on the main thread, but we can
postpone further checks on the nsIFile instances. - in the process, stop doing exists() calls on files because we can fail more
lazily. This allows us to remove more allowlist entries from
browser_startup_mainthreadio, though theisDirectory
calls will actually
still cause IO - they don't seem to create IO markers in the profiler.
We will move this IO away from the main thread in subsequent commits.
Depends on D48328
Assignee | ||
Comment 11•6 years ago
|
||
Remove:
- a list of allowed mimetypes; we only support flash anyway.
- writing to disk when a plugin's enabled state changes; the plugin's enabled
state is not kept on disk so there's no point. - tracking which plugins should load in the parent as no plugins should do so
if e10s is on.
Depends on D48329
Assignee | ||
Comment 12•6 years ago
|
||
This moves plugin finding logic into a separate class (PluginFinder).
PluginFinder is a runnable. After this commit, there are two ways in which it
can be used:
- to actually find and load plugins.
- to check if there have been any changes to plugins.
Although it is a runnable, at this point we still invoke its Run method on the
main thread, so all that's happening is we're separating the "look for plugins
on disk" bits out from everything else.
The goal is to be able to run the IO-intensive FindPlugins (including reading
and writing pluginreg) away from the main thread.
Depends on D48330
Assignee | ||
Comment 13•6 years ago
|
||
Finally, let's move the actual IO away from the main thread.
This means there are now 3 ways of looking for plugins:
- looking for changes from ReloadPlugins. This runs the PluginFinder runnable
on the main thread. - loading plugins from LoadPlugins. This will:
a) first check prefs and report the flash plugin based on that information,
if the prefs indicate it exists (using the callback provided by
nsPluginHost).
b) then hopefully dispatch to a background thread, where it will read
pluginreg.dat, scan the appropriate folders on disk, and see if
anything changed. Once done, it sets mFinishedFinding to true and
re-dispatches itself to the main thread.
c) then on the main thread, it reports any changes to nsPluginHost. - if dispatching in 2(b) fails, we will run steps (b) and (c) on the main
thread.
Note: if ReloadPlugins is called, we intiially do (1), but if we find
changes, we clear out the set of known plugins and then run LoadPlugins
again (meaning we go through 2 (or 3 if 2(b) fails)). This is how
reloading plugins worked prior to my changes and I've attempted not to
change it.
In order for this to work, there are some other changes in this commit:
- the sandbox prefs are being read "early" and cached for flash vs
"everything else". We can't access prefs on non-main threads without
using StaticPrefs, which doesn't seem worth it here. - some of the plugin tag classes are moved to threadsafe refcounting.
This is a bit unfortunate, but because they're instantiated on a non-
mainthread, and then later used on the main thread, despite the
fact that the architecture means nothing tries to touch them from
more than one thread at once, without threadsafe refcounting we hit
asserts in debug mode if we add references to them back on the main thread. - we add shutdown blocking for pluginfinding. We don't really want to
be halfway through finding plugins and then trying to shut them down,
or re-instantiating plugins after they've been unloaded. - we keep a reference to the "pending" pluginfinder instance while
doing lookups away from the main thread (ie (2)), to avoid re-entrancy or
trying to write to pluginreg while we're reading it somewhere else,
etc. If there's an attempt to do more plugin finding while this is
ongoing, we flip mDoReloadOnceFindingFinished and do a reload once
our initial lookups are complete.
Depends on D48331
Assignee | ||
Comment 14•6 years ago
|
||
This is now 90% there, I think. The latest trypush still has 2 failures that I'm not sure about (and a few that are already fixed in the patches I've uploaded):
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf7a756024538e9541d2b71e096b5612f8e0d632&selectedJob=270284966 - GECKO(6140) | Assertion failure: false (MOZ_ASSERT_UNREACHABLE: expecting a spawned plugin), at z:/build/build/src/dom/base/nsObjectLoadingContent.cpp:503 - I don't really know what this assertion means or how to debug it. :handyman, do you have any hints about where to go looking?
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf7a756024538e9541d2b71e096b5612f8e0d632&selectedJob=270292026 - GECKO(4472) | SUMMARY: AddressSanitizer: heap-use-after-free z:\build\build\src\dom\plugins\ipc\PluginModuleParent.cpp:1395 in mozilla::plugins::PluginModuleChromeParent::ActorDestroy(enum mozilla::ipc::IProtocol::ActorDestroyReason) - seems we're free'ing the
mPlugin
on the plugintag twice? I'm not really sure how this is possible - there's a guard to ensure we don't try to free a nullmPlugin
so I don't understand how it can be null when we try to free it. Here too I would appreciate tips, though I'll keep digging before I request review on the last cset...
Assignee | ||
Comment 15•6 years ago
|
||
Trypush with the latest: https://treeherder.mozilla.org/#/jobs?repo=try&revision=14b5bc526cddf7a9e13257106fa8be45fec325f8
Comment 16•6 years ago
•
|
||
I don't know how much help this will be but...
(In reply to :Gijs (he/him) from comment #14)
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf7a756024538e9541d2b71e096b5612f8e0d632&selectedJob=270284966 - GECKO(6140) | Assertion failure: false (MOZ_ASSERT_UNREACHABLE: expecting a spawned plugin), at z:/build/build/src/dom/base/nsObjectLoadingContent.cpp:503 - I don't really know what this assertion means or how to debug it. :handyman, do you have any hints about where to go looking?
I really don't. I was able to reproduce this once with your patches and, after that, it always ran fine. I figure it's timing related but all of the relevant nsObjectLoadingContent stuff is (when successful) happening on the main thread, as a response to some message. The ASSERT itself is just saying that nsObjectLoadingContent::InstantiatePluginInstance hasn't set the owner yet or the owner failed to initialize [1]. The behavior I see when I run the test is that InstantiatePluginInstance succeeds (with the two reentrant calls that immediately exit), which sets the owner, and later we get the HttpChannelChild::RecvOnStartRequest IPDL call, and MakePluginListener uses the new owner (where the test now fails).
This might be an effect of all of the 'intr' IPDL messages used in plugin initialization. Sometimes (under conditions I don't fully understand), an intr message will process other IPDL messages while it waits for a response (its... intrruptable). That's not visibly happening here -- your failing stack doesn't have a message handler in a random place -- but maybe it happened before the failure?
Alternatively, it could be an unexpected task scheduling, like the next one...
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf7a756024538e9541d2b71e096b5612f8e0d632&selectedJob=270292026 - GECKO(4472) | SUMMARY: AddressSanitizer: heap-use-after-free z:\build\build\src\dom\plugins\ipc\PluginModuleParent.cpp:1395 in mozilla::plugins::PluginModuleChromeParent::ActorDestroy(enum mozilla::ipc::IProtocol::ActorDestroyReason) - seems we're free'ing the
mPlugin
on the plugintag twice? I'm not really sure how this is possible - there's a guard to ensure we don't try to free a nullmPlugin
so I don't understand how it can be null when we try to free it. Here too I would appreciate tips, though I'll keep digging before I request review on the last cset...
I haven't dug in too deep but the double-free is only mysterious because PluginModuleChromeParent is a PluginLibrary and so it gets desstroyed as part of the nsNPAPIPlugin object [2]. You can see a number of weird coincidences in your failing stacks, but I think the issue is your call to nsPluginHost::ClearNonRunningPlugins in nsPluginHost::LoadPlugins. This is part of the stack that actually did the delete:
[task 2019-10-08T14:57:13.419Z] 14:57:13 INFO - GECKO(4472) | #0 0x7ffcb4a84520 in free Z:\task_1569588313\fetches\llvm-project\llvm\projects\compiler-rt\lib\asan\asan_malloc_win.cc:53
[task 2019-10-08T14:57:13.643Z] 14:57:13 INFO - GECKO(4472) | #1 0x7ffc93d4eaf6 in [thunk]:mozilla::plugins::PluginModuleChromeParent::`vector deleting destructor'`adjustor{760}' (unsigned int) (Z:\task_1570544296\build\application\firefox\xul.dll+0x189fbeaf6)
[task 2019-10-08T14:57:13.645Z] 14:57:13 INFO - GECKO(4472) | #2 0x7ffc93c21100 in nsNPAPIPlugin::~nsNPAPIPlugin(void) z:\build\build\src\dom\plugins\base\nsNPAPIPlugin.cpp:188
[task 2019-10-08T14:57:13.645Z] 14:57:13 INFO - GECKO(4472) | #3 0x7ffc93c692b2 in nsPluginTag::TryUnloadPlugin(bool) z:\build\build\src\dom\plugins\base\nsPluginTags.cpp:578
[task 2019-10-08T14:57:13.645Z] 14:57:13 INFO - GECKO(4472) | #4 0x7ffc93c3597a in nsPluginHost::ClearNonRunningPlugins(void) z:\build\build\src\dom\plugins\base\nsPluginHost.cpp:447
[task 2019-10-08T14:57:13.647Z] 14:57:13 INFO - GECKO(4472) | #5 0x7ffc93c56e4a in std::_Func_impl_no_alloc<`lambda at z:/build/build/src/dom/plugins/base/nsPluginHost.cpp:1915:7',void,bool,RefPtr<nsPluginTag>,nsTArray<mozilla::Pair<bool,RefPtr<nsPluginTag> > > &>::_Do_call z:\build\build\src\vs2017_15.8.4\VC\include\functional:1227
[task 2019-10-08T14:57:13.647Z] 14:57:13 INFO - GECKO(4472) | #6 0x7ffc93c5e3a2 in PluginFinder::Run(void) z:\build\build\src\dom\plugins\base\PluginFinder.cpp:223
[task 2019-10-08T14:57:13.647Z] 14:57:13 INFO - GECKO(4472) | #7 0x7ffc8a0d31c9 in nsThread::ProcessNextEvent(bool,bool *) z:\build\build\src\xpcom\threads\nsThread.cpp:1225
[task 2019-10-08T14:57:13.649Z] 14:57:13 INFO - GECKO(4472) | #8 0x7ffc8a0dc728 in NS_ProcessNextEvent(class nsIThread *,bool) z:\build\build\src\xpcom\threads\nsThreadUtils.cpp:486
[task 2019-10-08T14:57:13.649Z] 14:57:13 INFO - GECKO(4472) | #9 0x7ffc8a0d0e1d in nsThread::Shutdown(void) z:\build\build\src\xpcom\threads\nsThread.cpp:900
[task 2019-10-08T14:57:13.650Z] 14:57:13 INFO - GECKO(4472) | #10 0x7ffc93d0d3c2 in mozilla::plugins::FunctionBrokerParent::~FunctionBrokerParent(void) z:\build\build\src\dom\plugins\ipc\FunctionBrokerParent.cpp:54
[task 2019-10-08T14:57:13.650Z] 14:57:13 INFO - GECKO(4472) | #11 0x7ffc93d4e2ff in mozilla::plugins::FunctionBrokerParent::`scalar deleting destructor'(unsigned int) z:\build\build\src\dom\plugins\ipc\FunctionBrokerParent.cpp:48
[task 2019-10-08T14:57:13.650Z] 14:57:13 INFO - GECKO(4472) | #12 0x7ffc93d0de7e in mozilla::plugins::FunctionBrokerParent::Destroy(class mozilla::plugins::FunctionBrokerParent *) z:\build\build\src\dom\plugins\ipc\FunctionBrokerParent.cpp:89
[task 2019-10-08T14:57:13.651Z] 14:57:13 INFO - GECKO(4472) | #13 0x7ffc93d40ea2 in mozilla::plugins::PluginModuleChromeParent::ActorDestroy(enum mozilla::ipc::IProtocol::ActorDestroyReason) z:\build\build\src\dom\plugins\ipc\PluginModuleParent.cpp:1394
[task 2019-10-08T14:57:13.651Z] 14:57:13 INFO - GECKO(4472) | #14 0x7ffc8b33abaa in mozilla::ipc::IProtocol::DestroySubtree(enum mozilla::ipc::IProtocol::ActorDestroyReason) z:\build\build\src\ipc\glue\ProtocolUtils.cpp:572
[task 2019-10-08T14:57:13.652Z] 14:57:13 INFO - GECKO(4472) | #15 0x7ffc8b9cce79 in mozilla::plugins::PPluginModuleParent::OnChannelClose(void) z:\build\build\src\obj-firefox\ipc\ipdl\PPluginModuleParent.cpp:1620
[task 2019-10-08T14:57:13.652Z] 14:57:13 INFO - GECKO(4472) | #16 0x7ffc93d44d34 in mozilla::plugins::PluginModuleParent::NP_Shutdown(short *) z:\build\build\src\dom\plugins\ipc\PluginModuleParent.cpp:1909
[task 2019-10-08T14:57:13.653Z] 14:57:13 INFO - GECKO(4472) | #17 0x7ffc93c22101 in nsNPAPIPlugin::Shutdown(void) z:\build\build\src\dom\plugins\base\nsNPAPIPlugin.cpp:293
[task 2019-10-08T14:57:13.653Z] 14:57:13 INFO - GECKO(4472) | #18 0x7ffc93c6925d in nsPluginTag::TryUnloadPlugin(bool) z:\build\build\src\dom\plugins\base\nsPluginTags.cpp:577
[task 2019-10-08T14:57:13.654Z] 14:57:13 INFO - GECKO(4472) | #19 0x7ffc93c545c6 in nsPluginUnloadRunnable::Run(void) z:\build\build\src\dom\plugins\base\nsPluginHost.cpp:1247
[task 2019-10-08T14:57:13.654Z] 14:57:13 INFO - GECKO(4472) | #20 0x7ffc8a0d31c9 in nsThread::ProcessNextEvent(bool,bool *) z:\build\build\src\xpcom\threads\nsThread.cpp:1225
[task 2019-10-08T14:57:13.655Z] 14:57:13 INFO - GECKO(4472) | #21 0x7ffc8a0dc728 in NS_ProcessNextEvent(class nsIThread *,bool) z:\build\build\src\xpcom\threads\nsThreadUtils.cpp:486
[task 2019-10-08T14:57:13.655Z] 14:57:13 INFO - GECKO(4472) | #22 0x7ffc8b32dccf in mozilla::ipc::MessagePump::Run(class base::MessagePump::Delegate *) z:\build\build\src\ipc\glue\MessagePump.cpp:88
so while destroying the FunctionBrokerParent when unloading one plugin, an nsThread::Shutdown call is randomly running tasks, one of which is a PluginFinder::Run that calls your ClearNonRunningPlugins.
[1] https://searchfox.org/mozilla-central/source/dom/base/nsObjectLoadingContent.cpp#736
[2] https://searchfox.org/mozilla-central/source/dom/plugins/base/nsNPAPIPlugin.cpp#188
Comment 17•6 years ago
|
||
PS: I don't know how widely known this is but we plan to remove plugin support (and code) in 2020... in case that affects your prioritization.
Assignee | ||
Comment 18•6 years ago
|
||
(In reply to David Parks (dparks) [:handyman] from comment #17)
PS: I don't know how widely known this is but we plan to remove plugin support (and code) in 2020... in case that affects your prioritization.
Yep, I'm aware. I think there's a sufficient amount of mainthread IO here, early on startup, that isn't necessary (esp. for more than half of our users, who no longer even have flash, per https://data.firefox.com/dashboard/hardware#plugins ) that we should do something. And we haven't committed to when in 2020 we're ending support... Plus, I'm hoping I'm close - comment #16 definitely helps, I'll see if I can figure out the last few issues so I can request review on the last commit, too.
Assignee | ||
Comment 19•6 years ago
|
||
OK, I figured out the issue with the first test (and, I suspect, likely the second, too), which is that at some point when refactoring or shuffling stuff into different checks, I missed ensuring that we read/use the pref-based flash storage when checking for plugin changes without creating a new list of plugins from scratch.
This meant that whenever we checked for changes when in flash only mode (as we are in those failing tests, but not in a lot of the plugin tests because they rely on non-flash plugins...), we found changes (vs. cached info, which was missing the flash info...) every time we checked. And it turns out we check a lot, and if we find changes, we throw out non-running plugins and load plugins from scratch...
So, obviously I needed to add back a ReadFlashInfo
call to the codepath where we're "just" checking for changes. That makes the problem go away when running this test. What concerns me though is why this led to triggering this race condition - I'd rather also fix the race condition, rather than "only" making it harder to hit. So still digging into that...
Assignee | ||
Comment 20•6 years ago
|
||
Right, so the sequence of events is something like this:
- load plugins in parent, send to content, everything happy
- something triggers a reload of plugin info. There is lots of code that does this, so it's not particularly surprising, though obviously the bug described in comment #19 helps with reproducing because we keep finding changes so we keep actually reloading plugins, rather than "just" scanning some info on disk and going "nothing changed, whatever".
- the parent synchronously throws away the old plugin tags which aren't loaded, and starts loading the new ones. The initial addition of flash happens synchronously (from data cached in prefs)
- meanwhile, the content process decides to instantiate a plugin. It has async-updated lists of the tags in the parent. Those tags are identified by monotonically increasing numerical IDs.
- before the content process is updated, it asks the parent to instantiate a plugin via the PluginModuleParent bridge thingummyjig. To do this, it sends the ID that it has for the flash plugin ***
- but the parent cannot find that ID, because it has already reloaded. We get into
nsPluginHost::GetPluginForContentProcess
andPluginForID()
simply returns null. Now things go pearshapes because we cannot load a library for a plugin we have no idea about.
Assuming I'm getting this right, this is a fairly straightforward data race given that (some) communication between parent and child is async - there's no guarantee that the child has the "right" plugin IDs and the parent hasn't just been told to refresh them and actually found something. I think that in principle it could be triggered today, before my changes, if there were a change to plugins while Firefox is up. This doesn't even need to be a material change; just adding a non-plugin file in the directories we scan would trip this, I think. It requires bad luck, of course... Needinfo for this - I assume that I'm OK not to fix this race while I'm here (though I should obviously fix the issue in comment #19 that triggers wiping+re-adding plugins all. the. time.).
I'm tempted to suggest we just completely stop supporting updating/installing/uninstalling plugins while Firefox is up. Certainly Adobe's windows uninstaller will refuse to continue running until you shut down Firefox... I don't know if it also does this when updating it... - and I guess it's technically scope creep for this bug.
*** stack:
0:12.19 GECKO(10828) #01: nsPluginHost::EnsurePluginLoaded (d:\mozilla-central\dom\plugins\base\nsPluginHost.cpp:1194)
0:12.19 GECKO(10828) #02: nsPluginHost::GetPlugin (d:\mozilla-central\dom\plugins\base\nsPluginHost.cpp:1302)
0:12.19 GECKO(10828) #03: nsPluginHost::TrySetUpPluginInstance (d:\mozilla-central\dom\plugins\base\nsPluginHost.cpp:849)
0:12.19 GECKO(10828) #04: nsPluginHost::SetUpPluginInstance (d:\mozilla-central\dom\plugins\base\nsPluginHost.cpp:795)
0:12.19 GECKO(10828) #05: nsPluginHost::InstantiatePluginInstance (d:\mozilla-central\dom\plugins\base\nsPluginHost.cpp:737)
0:12.20 GECKO(10828) #06: nsObjectLoadingContent::InstantiatePluginInstance (d:\mozilla-central\dom\base\nsObjectLoadingContent.cpp:710)
0:12.20 GECKO(10828) #07: nsObjectLoadingContent::LoadObject (d:\mozilla-central\dom\base\nsObjectLoadingContent.cpp:2211)
0:12.20 GECKO(10828) #08: nsObjectLoadingContent::OnStartRequest (d:\mozilla-central\dom\base\nsObjectLoadingContent.cpp:1014)
0:12.23 GECKO(10828) #09: mozilla::net::HttpChannelChild::DoOnStartRequest (d:\mozilla-central\netwerk\protocol\http\HttpChannelChild.cpp:682)
0:12.23 GECKO(10828) #10: mozilla::net::HttpChannelChild::OnStartRequest (d:\mozilla-central\netwerk\protocol\http\HttpChannelChild.cpp:608)
Assignee | ||
Comment 21•6 years ago
|
||
Comment 22•6 years ago
|
||
Assuming I'm getting this right, this is a fairly straightforward data race given that (some) communication between parent and child is async - there's no guarantee that the child has the "right" plugin IDs and the parent hasn't just been told to refresh them and actually found something. I think that in principle it could be triggered today, before my changes, if there were a change to plugins while Firefox is up. This doesn't even need to be a material change; just adding a non-plugin file in the directories we scan would trip this, I think. It requires bad luck, of course... Needinfo for this - I assume that I'm OK not to fix this race while I'm here (though I should obviously fix the issue in comment #19 that triggers wiping+re-adding plugins all. the. time.).
Huh. Good catch. I don't know of any mechanism in the code base that is intended to handle this data race so you are probably right that it's there now. You can ignore that case. There are, as you point out, way too many ways to inspire a plugin registry reload (through privileged JS, through the plugin, and more) but the only interesting case I can see would be if a user installed or updated Flash and somehow then tried to use it before the IDs were updated in content, which should always happen asynchronously (apparently) but quickly. This seems very far fetched and just ends in the plugin not starting but presumably the ID would be updated on a page reload. Also, many ways of reloading go through content so they synchronize with the parent automatically. Still, its not ideal but plugin initialization is the 9th circle of concurrency hell. There are subtle bugs. Changes are hard to validate (which is why my reviews are taking so long -- sorry bout that).
I'm tempted to suggest we just completely stop supporting updating/installing/uninstalling plugins while Firefox is up. Certainly Adobe's windows uninstaller will refuse to continue running until you shut down Firefox... I don't know if it also does this when updating it... - and I guess it's technically scope creep for this bug.
I think this was the general idea but it isn't the reality. We certainly don't enforce it. My own experience with Comcast's Xfinity streaming service is that I rarely use it but its the only page that uses Flash that I frequent. When I do, I often find that my Flash is out of date. I can usually update it without restarting Firefox.
I just did this:
- Ran Flash uninstaller and rebooted.
- Installed version 32.0.0.207 (old -- newest is 270 -- don't be fooled by anagrams) and confirmed use with about:plugins.
- Ran the updater for version 32.0.0.270. At the end it told me to restart my browser and I got the "Firefox is currently running. Close Firefox?" modal we put up -- the installer wanted to open a web site. I canceled that.
- In the browser (no restart), go to https://helpx.adobe.com/flash-player.html and run the Check. It shows version 32.0.0.270 -- the latest.
- Confirm that latest is running in about:plugins. (Note: for whatever reason, you have to go to a web site with a plugin to reload. Hence step 4.
So you can update without restart if you don't have Flash running. And I think this is probably somewhat common because Flash is so rare. I would not mess with that.
Comment 23•6 years ago
|
||
Comment 24•6 years ago
|
||
Backed out 5 changesets (bug 1545123) for causing nsPluginTags.cpp build bustages
push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=91313cceae8c54aedd2b177ae14b10269e8e1678&selectedJob=274212546
backout: https://hg.mozilla.org/integration/autoland/rev/b4197fcbd41190d40e4eec3ab4c2e270d80c7424
Assignee | ||
Comment 25•6 years ago
|
||
Huh, weird, I pushed the same stuff to try on Monday and it compiled fine then... https://treeherder.mozilla.org/#/jobs?repo=try&revision=75f78e5ce52e92d9726bc25a70687e3719a3a26d . Looking.
Assignee | ||
Comment 26•6 years ago
|
||
Oh, so only specific types of linux builds (not the "normal" ones we use for test runs) failed...
Attempt to fix and get these builds to run on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3d4332b9afc09429f85ec1f4ce1640f641f535f
There was also a pgo run failure, but retriggers succeeded, there are other similar failures elsewhere in the autoland recent pushes (cf. https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=274156969&searchStr=linux%2Crun&revision=b19afbec384a1e3cdee81102f4cd4e89992bab49 ) and sadly it seems we do not get crash stacks or minidumps for crashes during profiling (which in fact is probably something that should be fixed, I'll file a bug). So let's hope that was a fluke...
Assignee | ||
Comment 27•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #26)
sadly it seems we do not get crash stacks or minidumps for crashes during profiling (which in fact is probably something that should be fixed, I'll file a bug).
Filed bug 1593465.
Assignee | ||
Comment 28•6 years ago
|
||
Try was green (on the same job that failed on autoland), so let's try again.
Comment 29•6 years ago
|
||
Comment 30•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dc85aaeb9ee9
https://hg.mozilla.org/mozilla-central/rev/1facb4403c96
https://hg.mozilla.org/mozilla-central/rev/ccb230067cd9
https://hg.mozilla.org/mozilla-central/rev/ddaad03e2e35
https://hg.mozilla.org/mozilla-central/rev/16d21676bcb9
Updated•6 years ago
|
Updated•3 years ago
|
Description
•